Abstract

Selection is a fundamental user operation in 3D environments.
These environments often simulate or augment the real world, and a
part of that simulation is the ability to select objects for observation
and manipulation. Many user interfaces for these applications depend
on six-degree-of-freedom tracking devices. Such devices have
limited accuracy and are susceptible to noise, giving an imprecision
that makes object selections difficult and hard to repeat. This
difficulty is amplified when the user’s viewpoint is also tracked,
meaning the user must compensate for noise from both the head tracker
and the pointing device when performing object selection. Also,
users may experience fatigue when using handheld pointing devices
for extended periods, creating error even if the tracking technology
were perfect.

This paper presents a pointing-based probabilistic selection
algorithm that addresses some of the ambiguities associated with
tracking and user imprecision. It performs multiple selections by
considering a frustum along the user’s pointing direction and the
hierarchical structure of the database. It assigns probabilities that
the user has selected particular objects using a set of low-level 3D
intersection-based selection techniques and the relationship of the
objects in a hierarchical database, and makes the final selection
using one of several weighting schemes. We performed several
experiments to evaluate the low-level selection techniques, tested
several weighting schemes for the integration algorithm, and show that
the algorithm is effective at disambiguating multiple selections.

CR Categories: H.5.2 [Information Interfaces and Presentation]:
User Interfaces - Interaction styles; H.5.3 [Information Interfaces
and Presentation]: Group and Organizational Interfaces -
Computer-supported cooperative work;

Keywords:  interaction,  selection,  algorithms,  augmented  reality, 
virtual  reality,  hierarchical  databases 

1  Introduction 

Many virtual and augmented reality systems present the user with a
rendering of a 3D world containing distinct objects that the user can
query or manipulate. To perform these actions on objects, the user
usually must first select the object. While there are many ways to
select objects, pointing at the desired object is a common and natural
way to select. Selection by pointing can happen using a range
of devices, from a common 2D mouse controlling a cursor on a 2D
projection of the 3D world, to a full six-degree-of-freedom (DOF)
hand-held tracking device. Selection can also happen without using
the hands at all, by allowing the user to select using head orientation
(assuming the head is tracked) or gaze direction using an eye
tracker.

*{gregory.schmidt, dennis.g.brown, erik.tomlin}@nrl.navy.mil

†baillot@ittid.com

‡swan@cse.msstate.edu

We assert that all user selection operations are susceptible to
error. First, there is human error: the imprecision that comes from
lack of experience, not enough motor control to do fine-grained
selection, or fatigue developed during a session; Wingrave et al. [13]
studied a number of correlations between certain attributes of users
and their ability to perform selections. Second, there is equipment
error, which could be noise, drift, and lag in a 6DOF tracking
system, or simply not enough resolution on a wheel-based device to
perform a fine selection. Finally, there are ambiguities associated
with the scene itself, such as when the user tries to select one object
occluded by another object. In our main application area, mobile
augmented reality, this is a common problem because users have
“x-ray vision” and can see spatial information, such as the position
of a collaborator, that may be occluded by real or virtual objects;
in this example, the collaborator may be behind a building. These
errors can lead to selections that are totally incorrect, such as when
using a ray-based selection that chooses a single object, or to
ambiguous results when using multiple selection techniques that can
choose many candidate objects.

We designed a pointing-based probabilistic selection algorithm
that alleviates some of the error in user selections. This technique
takes into consideration the hierarchical structure of the scene
objects (e.g., a door is a child of a wall, which is a child of a building,
and so on). It assigns probabilities that the user has selected
particular objects, within a frustum along the user’s pointing direction,
using a set of low-level 3D intersection-based selection techniques
and the relationship of the objects in a hierarchical database, and
makes the final selection using one of several weighting schemes.
We implemented this algorithm in our virtual and augmented reality
application framework and performed several experiments to evaluate
the low-level selection techniques, to evaluate several weighting
schemes for the integration algorithm, and to show that the
algorithm can effectively disambiguate multiple selections.

After describing related work, we describe the low-level selection
techniques. We then present the design and discuss the results
of the experiments described above.

2  Related  Work 

Selection in 3D environments has been an active research topic
since the first virtual environments were implemented. Hinckley
et al. [4] presented a survey of, and a common framework for,
techniques for 3D interaction up to that point in time. Liang and
Green [8] developed the spotlight method of selection, which
alleviated some issues with using ray-based selection for small and
far objects, but introduced the problem of multiple selections, for
which they set up rules to choose one of the possible selections.
Mine [9] described a few techniques for selection, along with other
interactions to be supported in virtual environments. Forsberg et
al. [3] developed two novel selection techniques, aperture (an
extension of spotlight) and orientation, to deal with the imprecision
of ray-based selection using a 6DOF input device. Pierce et al. [11]
introduced a set of selection techniques using tracked hands and the
2D projection of a 3D scene. In his thesis, Bowman [1] gave a
thorough survey of 3D selection techniques at the time, and introduced
a novel technique for selection and manipulation.

In more recent work, researchers have dissected the task of
selection even further, and have created novel selection techniques
for specific application domains such as Augmented Reality (AR)
and multimodal systems. Wingrave et al. [12] discovered that users
do not have an internal model of how they expect the environment
to behave (for example, in performing selections), but instead they
adapt to the existing environment using feedback received when
performing tasks. Olwal et al. [10] developed some of the first
algorithms for selecting in AR. Their technique attaches a virtual
volumetric region of interest to parts of a user’s body. When the user
moves the body part, interacting with objects in the environment,
a rich set of statistical data is generated and is used for processing
selections. Kolsch et al. [7] developed a real-time hand gesture
recognition system that can act as the sole input device for a mobile
AR system. Kaiser et al. [6] developed mutual disambiguation
techniques and evaluated their effectiveness for 3D multimodal
interaction in AR and Virtual Reality (VR). They showed that mutual
disambiguation accounts for over 45% of their system’s successfully
recognized multimodal commands.

We acknowledge that multimodal systems are an effective way
to alleviate some selection issues (for example, the user points to a
group of objects that includes a window, and says the word
“window,” giving the system the means to disambiguate the selection),
and we have implemented multimodal (speech and pointing)
processing in our system for disambiguating different types of objects
(for example, windows, walls, and buildings). However, if there is
more than one object of the same type (for example, four windows)
in the selection space, then the system described above will fail,
since the utterance “window” is ambiguous and does not help
correct a wrong selection. In this example, either more sophisticated
speech semantics or pointing-based selection processing techniques
are needed.

In this paper, we have chosen to focus solely on improving
pointing-based selection. Our selection algorithm introduces the
concept of executing multiple selection techniques in parallel and
choosing the final selection from the results of those techniques
using a weighting scheme. The low-level techniques and the
integration algorithm are described in the next section.


3  Probabilistic Pointing-Based Selection Algorithm

Selection  by  pointing  in  3D  environments  is  inherently  imprecise 
when  the  user  is  allowed  to  select  occluded  objects — the  user  may 
have  the  impression  of  pointing  to  a  specific  object,  for  example, 
but  the  system  may  not  know  for  sure  which  object  in  the  pointing 
direction  is  meant  to  be  selected.  Users  often  make  pointing  errors, 
especially  when  selecting  small  objects,  objects  at  a  distance,  or 
when  trying  to  make  a  selection  quickly.  Furthermore,  pointing 
provides  the  object’s  direction,  but  not  distance,  so  when  several 
objects  lie  in  the  direction  the  user  is  pointing,  it  remains  unclear 
which  object  the  user  intended  to  select. 

To  deal  with  selection  ambiguity,  we  designed  a  probabilistic 
selection  algorithm  that  generates  lists  of  candidate  objects  the  user 
may  have  meant  to  select,  and  probability  estimates  of  how  likely 
it  is  the  user  meant  to  select  each  object.  The  algorithm  combines 
several  intersection  algorithms  and  the  hierarchical  structure  of  the 
dataset,  and  then  integrates  the  resulting  candidate  selections.  The 
processing  steps  of  the  algorithm  are  shown  in  Figure  1,  which  we 
describe  in  each  of  the  following  sections. 


Figure 1: Flow of the pointing-based selection algorithm, from Input
to Final Selection.

3.1  Frustum  Intersection  Algorithms 

We have designed a set of three algorithms which attempt to
mitigate the ambiguity associated with ray intersection. Each
algorithm is based on the concept of rendering the scene into a small
selection frustum (see Figure 2); the rendered scene is viewed by
a camera pointing coincident to the selection ray, and the frustum
is rendered into an off-screen image buffer. The algorithms then
count and classify the pixels in this buffer, and use these counts to
create a list $(o_1, p_1), \ldots, (o_n, p_n)$ of potentially selected
objects $o_i$ and associated selection probabilities $p_i$.

As described in more detail below, each of these algorithms has
differing utility depending on the user’s preferences for making
selections, on what type of object the user is trying to select, and on
its relationship to other objects in the scene. We have designed
the three intersection algorithms such that each has a different user
preference for selection. These preferences are: (1) select the item
nearest the central pointing ray; (2) select the largest item in the
viewing frustum; and (3) select using a combination of the two
other approaches. We wanted to find out if having several
algorithms available based on different user preferences increases the
chances for correctly selecting objects. These algorithms could
either be used individually or executed in parallel and their results
integrated together.

We  describe  each  intersection  algorithm  in  more  detail,  and  then 
show  how  their  output  lists  of  candidate  selections  are  integrated 
when  the  algorithms  are  run  in  parallel. 

Pixel-Count  The Pixel-Count algorithm preferentially orders
objects according to their projected size in the selection frustum.
PIXEL-COUNT simply counts the number of pixels occupied
by each object, and weighs the objects accordingly. This
pixel-counting technique is a very fast way of implementing ordering of
objects by projected size. A similar technique has been reported by
Olwal [10].

Pixel-Count
Input: 3D direction
Output: list (o_1, p_1), ..., (o_n, p_n) of candidate objects
        o_i and associated probabilities p_i

calculate a small frustum about 3D direction
for each object o_i in the frustum
    render o_i into the frustum
    pix_i <- number of pixels covered by o_i
    weight w_i <- pix_i / total-frustum-pixels
assign probabilities p_i from weights w_i
sort (o_i, p_i) list by decreasing probabilities p_i

Figure 2: The operation of the PIXEL-COUNT algorithm. The scene
is rendered into the small selection frustum shown in the center of
the image.

Figure 2 demonstrates PIXEL-COUNT. The green square in the
center of the image demonstrates one size of the selection frustum;
in the lower-right the frustum contents are enlarged. Note that the
frustum size may be adjusted by the user. The square in the lower
left corner shows a low-resolution re-rendering of the frustum
contents.

The PIXEL-COUNT algorithm is robust to noise and pointing
ambiguity. However, it inherently assumes the user is attempting to
select larger objects, and it does not work well for selecting small
objects near larger objects.

Barycentric-Pixel-Count  This algorithm was motivated by our
observation that users tend to point toward the center of the visible
part of the object they wish to select. Figure 3 describes how
BARYCENTRIC-PIXEL-COUNT operates. The algorithm calculates
the center point of the visible portion of each object ($O_1$ and $O_2$),
and then determines the distance to the center of the selection
frustum ($d_1$ and $d_2$). It then weighs each object’s pixels with the
inverse of this distance; so in Figure 3 object 1’s pixels are weighted
by $1/d_1$ and object 2’s pixels by $1/d_2$. Since it is assumed that the
user is intending to look at one object, probabilities are estimated
by normalizing the weights across all of the weighted pixels.

Figure 3: The operation of the BARYCENTRIC-PIXEL-COUNT
algorithm. Because $d_1 < d_2$, each pixel of object 1 will be weighted more
heavily than the pixels of object 2.

Barycentric-Pixel-Count
Input: 3D direction
Output: list (o_1, p_1), ..., (o_n, p_n) of candidate objects
        o_i and associated probabilities p_i

calculate a small frustum about 3D direction
let F_c be the center of the frustum
for each object o_i in the frustum
    let O_i be the center of the visible portion of o_i
    bary-weight <- 1 / ||F_c - O_i||
    render o_i into the frustum
    for each pixel a generated by o_i
        pix_i <- pix_i + a * bary-weight
    weight w_i <- pix_i / total-frustum-pixels
assign probabilities p_i from weights w_i
sort (o_i, p_i) list by decreasing probabilities p_i

BARYCENTRIC-PIXEL-COUNT works very well for selecting
small objects near larger objects, but it does not work well if the
user points away from the center of an object, or if the object has a
shape such that the Barycentric center does not lie within the object
itself.
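The two pixel-counting schemes above can be sketched roughly as follows. This is an illustrative Python sketch, not the implementation used in the paper: it assumes the off-screen frustum render is available as a small 2D grid of object IDs (a hypothetical `id_buffer`, with `BACKGROUND` marking uncovered pixels), it normalizes weights over covered pixels so that the probabilities sum to one, and it guards against a zero centroid distance with a hypothetical `eps` parameter.

```python
from collections import defaultdict
from math import hypot

BACKGROUND = 0  # hypothetical ID for pixels not covered by any object


def _normalize(weights):
    """Turn raw weights into (object, probability) pairs, largest first."""
    total = sum(weights.values())
    if total == 0:
        return []
    pairs = [(obj, w / total) for obj, w in weights.items()]
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


def pixel_count(id_buffer):
    """Rank objects by the number of frustum pixels they cover."""
    counts = defaultdict(int)
    for row in id_buffer:
        for obj in row:
            if obj != BACKGROUND:
                counts[obj] += 1
    return _normalize(counts)


def barycentric_pixel_count(id_buffer, eps=1e-6):
    """Weight each object's pixel count by the inverse distance between
    the centroid of its visible pixels and the frustum center."""
    height, width = len(id_buffer), len(id_buffer[0])
    cx, cy = (width - 1) / 2.0, (height - 1) / 2.0
    counts = defaultdict(int)
    sum_x = defaultdict(float)
    sum_y = defaultdict(float)
    for y, row in enumerate(id_buffer):
        for x, obj in enumerate(row):
            if obj != BACKGROUND:
                counts[obj] += 1
                sum_x[obj] += x
                sum_y[obj] += y
    weights = {}
    for obj, n in counts.items():
        dist = hypot(sum_x[obj] / n - cx, sum_y[obj] / n - cy)
        # eps is an assumption of this sketch: it avoids dividing by zero
        # when a centroid coincides with the frustum center.
        weights[obj] = n / max(dist, eps)
    return _normalize(weights)
```

On a buffer in which a large object fills most of the left side and a single-pixel object sits at the center, `pixel_count` ranks the large object first while `barycentric_pixel_count` ranks the small centered object first, which mirrors the trade-off described in the text.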


Gaussian-Pixel-Count  The Gaussian-Pixel-Count algorithm
is also motivated by the general observation that users tend to center
the objects they want to select. However, this algorithm tries to
address the failing of the BARYCENTRIC-PIXEL-COUNT algorithm,
which occurs when the Barycentric center does not lie within the
object itself. GAUSSIAN-PIXEL-COUNT operates by applying a
Gaussian mask, centered in the selection frustum, to each object’s
pixels. The mask operates, in effect, by assigning weights to each
pixel based on its distance from the center ray according to a
Gaussian bell curve. Figure 4 describes how GAUSSIAN-PIXEL-COUNT
operates. The filtered output for each individual object is combined
in an accumulation buffer. Probabilities are assigned, again
assuming one object is intended to be selected, by normalizing across the
weighted pixels.


Gaussian-Pixel-Count
Input: 3D direction
Output: list (o_1, p_1), ..., (o_n, p_n) of candidate objects
        o_i and associated probabilities p_i

calculate a small frustum about 3D direction
calculate a Gaussian filter G centered in frustum
for each object o_i in the frustum
    render o_i into the frustum
    for each pixel a generated by o_i
        pix_i <- pix_i + a * G
    weight w_i <- pix_i / total-weighted-frustum-pixels
assign probabilities p_i from weights w_i
sort (o_i, p_i) list by decreasing probabilities


The  algorithm  is  less  susceptible  to  being  biased  by  large  visible 
objects  and  it  favors  selecting  objects  near  the  central  viewing  ray. 
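A minimal sketch of the Gaussian weighting, under the same assumptions as a grid of object IDs standing in for the off-screen render, might look like the following. The names `id_buffer`, `BACKGROUND`, and the `sigma` parameter (the mask width, in pixels) are hypothetical; the paper does not state what mask width was used.

```python
from collections import defaultdict
from math import exp

BACKGROUND = 0  # hypothetical ID for pixels not covered by any object


def gaussian_pixel_count(id_buffer, sigma=1.0):
    """Weight every covered pixel by a circularly symmetric Gaussian
    centered on the frustum, then normalize per-object sums."""
    height, width = len(id_buffer), len(id_buffer[0])
    cx, cy = (width - 1) / 2.0, (height - 1) / 2.0
    weights = defaultdict(float)
    for y, row in enumerate(id_buffer):
        for x, obj in enumerate(row):
            if obj != BACKGROUND:
                r2 = (x - cx) ** 2 + (y - cy) ** 2
                weights[obj] += exp(-r2 / (2.0 * sigma ** 2))
    total = sum(weights.values())
    if total == 0:
        return []
    pairs = [(obj, w / total) for obj, w in weights.items()]
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)
```

For a ring-shaped object surrounding a smaller centered object, the ring covers more pixels, but the centered object accumulates more Gaussian weight and is ranked first. Because the weighting is per pixel rather than per centroid, the ring's centroid falling outside the ring itself does not affect the result.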


Figure 4: The operation of the GAUSSIAN-PIXEL-COUNT algorithm.
Pixels in objects 1 and 2 are weighted by a circularly symmetric
Gaussian function centered at $F_c$.

3.2  Probability Propagation in Hierarchical Database

The probability estimates generated by the ray intersection
algorithms assume that a single object occupies a given space in
the viewing frustum. This assumption does not hold for a
hierarchically-organized database, which contains objects composed
of smaller objects, which in turn may be composed of even
smaller objects, and so forth. This type of database can have
several objects occupying a given space, though there will always be
a relationship between the occupying objects. An example of a
hierarchically-organized database, which we use in our testing, and
the inter-relationships between objects, is illustrated in Figure 5.

Figure 5: A portion of a hierarchically-organized database that
contains urban data such as buildings, streets, signs, etc. (Node labels
recoverable from the figure: world; city; building, street, sign;
glass, screen.)

In order to assign probabilities properly for a
hierarchically-organized database, we (1) set the ray-intersection
algorithms to assign probabilities to the lowest-level structure for each
pixel, and (2) propagate the probabilities up the tree hierarchy from
the leaf nodes. Since our ray intersection algorithm returns the
lowest-level structures, we use the following algorithm to propagate
probabilities up the database hierarchy:


Probability-Propagation
Input: old list L_O = (o_1, p_1), ..., (o_n, p_n) of
       objects o_i and associated probabilities p_i
Output: new list L_N = (o_1, p_1), ..., (o_m, p_m)

 1  create empty list L_N
 2  for each object o_i in L_O
 3      if o_i not in L_N then
 4          add o_i to L_N
 5      o_i.weight <- p_i
 6      for each recursive parent of o_i
 7          if o_i.parent not in L_N then
 8              add o_i.parent to L_N
 9          o_i.parent.weight <- o_i.parent.weight + p_i
10  for each object o_i in L_N
11      normalize o_i.weight
12      assign probability p_i from o_i.weight
13  sort L_N by decreasing probabilities


One subtlety to note is in line 5, where we treat the probability
for each pair as a “weight” for that pair, since we perform
operations with these values that do not strictly treat them as
probabilities. The resulting probability assignments estimate the
likelihood that any of the occupying objects for a given space is the
desired selection. For example, using the hierarchy in Figure 5, for
a ray intersecting a window, its probability of selection is equal to
the probability of selecting the wall, building, city, and world at
the intersecting pixel. Another property of this probability
propagation approach is that the probability for a parent is always at
least as large as the probability for any of its children. Keep
in mind that another reasonable and equally valid manner of assigning
probabilities is to consider the hierarchical nature of the database
at a lower level, when the ray intersection algorithms estimate the
probabilities. This approach may lead to different but still similar
assignments of probabilities.
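The propagation step can be illustrated with a short Python sketch. It is written under several assumptions that are not stated in the paper: the hierarchy is given as a hypothetical `parent_of` mapping from each object to its parent, weights are accumulated with addition for leaves as well as parents, and "normalize" is taken to mean dividing by the sum of all weights.

```python
def propagate_probabilities(leaf_probs, parent_of):
    """Push each leaf probability up to every ancestor, then normalize.

    leaf_probs: list of (object, probability) pairs from an intersection
                algorithm (lowest-level structures only).
    parent_of:  dict mapping an object to its parent; the root is absent
                from the keys (hypothetical representation of the database).
    """
    weights = {}
    for obj, p in leaf_probs:
        weights[obj] = weights.get(obj, 0.0) + p
        ancestor = parent_of.get(obj)
        while ancestor is not None:
            # Every recursive parent receives the child's weight.
            weights[ancestor] = weights.get(ancestor, 0.0) + p
            ancestor = parent_of.get(ancestor)
    total = sum(weights.values())
    if total == 0:
        return []
    pairs = [(obj, w / total) for obj, w in weights.items()]
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)
```

With a hierarchy such as window and door under a wall, the wall under a building, and so on up to the world, each ancestor accumulates the weight of both leaves, so a parent is never ranked below any of its children.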


3.3  Integration  of  Probability  Assignments 

The three lists of objects and associated probabilities generated by
the ray intersection algorithms and probability propagation
algorithm need to be combined into one list. One caveat to this
process is that each list may contain a slightly different set of
object-probability pairs due to differences in how each algorithm
operates. Thus, the elements of the lists have to be matched and
like items combined. Another important note is that, in the process
described below, just as in the probability propagation algorithm,
we treat the probability for each pair as a “weight” for that pair,
since we perform operations with these values that do not strictly
treat them as probabilities. A naive integration approach
would be, for each object, to simply average the weights assigned to
that object by the different algorithms. This approach, however,
does not take into consideration the strengths and weaknesses of
each of the three algorithms. A more appropriate way to integrate
them is to assign a weight $W_i$ to each algorithm, based on how
well each performs in comparison to the others. The lists are then
integrated by the following WEIGHTED-INTEGRATION algorithm.


Weighted-Integration
Input: 3 lists L_G, L_B, L_P = (o_1, p_1), ..., (o_n, p_n) of
       objects o_i and associated probabilities p_i
Output: new list L_N = (o_1, p_1), ..., (o_m, p_m)

create empty list L_N
for each list L_j in L_G, L_B, L_P
    for each object o_i in L_j
        if o_i not in L_N then
            add pair to L_N
        else
            find o_i in L_N
        L_N.o_i.weight <- p_i * W_j
for each object o_i in L_N
    normalize o_i.weight
    assign probability p_i from o_i.weight


The integration weights $W_i$, $i = G, B, P$, corresponding to the
intersection algorithms GAUSSIAN-PIXEL-COUNT,
BARYCENTRIC-PIXEL-COUNT, and PIXEL-COUNT, respectively, are initially
arbitrarily assigned to $1/3$ each (giving the same effect as the naive
method of averaging). However, we acknowledge that having
proper weight assignments for data integration is important for
optimizing performance and is a difficult task. We made several
attempts to refine the weight assignments. We used the performance
estimates of the intersection algorithms to influence the assignment
of the weights in the integration. In one case, we normalized the
algorithm performance estimates and used those as the weight values.
For the other case, we used the normalized performance estimates
as a guide to refine the weight assignments. More details about the
assignment of weights are given in Section 4.
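As a hedged illustration of the integration step, the Python sketch below combines per-algorithm candidate lists using per-algorithm weights. It makes two assumptions beyond the text: contributions for the same object are summed across lists (which is what makes equal weights reproduce naive averaging), and the final normalization divides by the sum of all combined weights. The helper `weights_from_scores` shows one reading of "normalized performance estimates"; the dictionary keys `"G"`, `"B"`, `"P"` and any scores passed in are hypothetical.

```python
def weighted_integration(candidate_lists, algorithm_weights):
    """Combine (object, probability) lists from several algorithms.

    candidate_lists:   dict mapping an algorithm name to its list of pairs.
    algorithm_weights: dict mapping the same names to integration weights.
    """
    combined = {}
    for name, pairs in candidate_lists.items():
        w = algorithm_weights[name]
        for obj, p in pairs:
            # Assumption of this sketch: matching objects are accumulated.
            combined[obj] = combined.get(obj, 0.0) + p * w
    total = sum(combined.values())
    if total == 0:
        return []
    result = [(obj, v / total) for obj, v in combined.items()]
    return sorted(result, key=lambda pair: pair[1], reverse=True)


def weights_from_scores(scores):
    """Normalize per-algorithm performance scores so they sum to one."""
    total = sum(scores.values())
    return {name: s / total for name, s in scores.items()}
```

With equal weights of 1/3, each object's combined value is the plain average of its per-algorithm probabilities. Passing the output of `weights_from_scores` instead shifts influence toward whichever algorithms scored better.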

4  Performance  Evaluation 

We conducted several experiments to gain a better understanding
of the effectiveness of the algorithms for disambiguating multiple
pointing selections. We approached this task by first comparing
empirically the three intersection algorithms head-to-head to learn
the strengths and weaknesses of each algorithm for specific dataset
cases. Second, we evaluated the integration algorithm, exploring
several weighting schemes, to determine how best to utilize
combinations of the intersection algorithms in parallel. Lastly, we
conducted a short experiment to demonstrate and empirically evaluate
how well the algorithms work for disambiguating selections.

4.1  Comparison  of  Intersection  Algorithms 

We conducted three experiments to compare the three intersection
algorithms head-to-head. The first experiment was designed to
test the experimental protocol, flushing out any design and
testing issues, and used a simple real-world urban dataset and a few
test cases. Each successive experiment increased the detail of the
datasets and complexity of the test cases. Comparing any
algorithm thoroughly typically requires running an extensive set of
experiments with multiple datasets and conditions. The goal of these
experiments was to get a general feel for the accuracy of each
algorithm for performing selection using a real-world urban dataset.

Since our overall goal is to apply these algorithms for
disambiguating multiple selections, we acknowledge that precision plays
a role in how we evaluate the algorithms. In particular, the
selection cases we address only become interesting when high
precision is required; otherwise the selections could be easily performed
using well-known ray-based pointing techniques. We ran a set of
preliminary tests to evaluate the degree of precision needed for the
experiments and tested the selection techniques on a range of sizes
for objects and varying amounts of space between each. Some of
the test cases are shown in Figure 6. We used the precision
estimates to guide our choice of the test cases for the experiments,
making sure that the required degree of precision was high, but low
enough to collect meaningful data to compare the three algorithms.
Later, when we tested the algorithms for disambiguating selections
of smaller, distant objects, we increased the degree of precision
required. We next describe the experiment preparations, experimental
protocol, and each experiment.

Experiment Preparation  We prepared for the experiments by
implementing the algorithms and user interface in a combined AR
and VR system our laboratory has been developing over the last
several years. The two major building blocks of the combined
system are the Battlefield Augmented Reality System (BARS) [5],
which provides support for 3D in the form of VR and AR (with
special emphasis on mobile AR and multi-wall VR), and
Quickset [2], which provides support for 2D map-based interaction
using an agent-based architecture that supports probabilistic,
asynchronous input events. The system has evolved extensively over
the last several years to support a large number of input and display
devices, interaction and display algorithms, as well as the interface
techniques and associated algorithms described in this paper. The
major components we added are the pointing-based user interface,
the associated selection algorithms, a speech recognition system,
and the Quickset multimodal speech-and-gesture integrator.

Figure 6: Three of the test cases used to determine the degree of
precision needed by the experiments. We varied the size of several
windows, and the spacing between each, and asked several users to
perform selections of the windows for each test case. Statistics were
collected to determine the precision required for our experiments.

Experimental Protocol  For each of the three experiments, we
recruited two to four subjects from our development team. Each
subject wore a tracked, see-through head-mounted display (HMD) that
included a microphone for the speech recognizer. The orientation
of the head was used as the pointing direction. The system showed
a cross-hair cursor at the center of the view frustum for the pointing
direction. The view frustum’s borders were made invisible to the
user and its size was kept fixed throughout the experiments. The
experiments operated under the assumption that the user’s selection
strategy was to point directly through the center of the object
intended to be selected. Other strategies, such as selecting the largest
item in the view frustum, are evaluated later in Section 4.3.

Each subject was given a training session to become familiar
with the pointing and speech input interaction mechanism. The
subjects were prompted by the experiment administrator to perform
selections using head direction combined with a voice command.
For example, the subject is asked to point at the upper right window
of the left building, shown in Figure 6, while speaking “secure this
window.” The system responds by changing the color, based on the
voice command, of what the system determines to be the correct
selection. In Figure 6, the verb “clear” triggers a change in the color
of the upper right window to green.

The trials of each experiment asked the subjects to perform a
sequence of selections over a range of test cases. The selection test
cases presented were sets of windows, doors, walls, and buildings.
The speech actions included “danger on this object,” “clear this
object,” and “reset this object”; these commands are meaningful in
the urban situational awareness domain of the BARS system [5].
Data were collected for the three frustum-based algorithms. Cases
where the speech recognizer failed were discarded, since we were
evaluating neither the speech recognizer nor the multimodal features
of the system. The test cases were presented in a counterbalanced
manner in order to eliminate any learning-effect biases.

Experiment 1 The first experiment used a small subset of the
dataset shown in Figure 7. The dataset contains semantic
information about the buildings surrounding our laboratory; it is a
real-world database used by our system for various development,
demonstration, and evaluation purposes. We ran the experiment,
following the experimental protocol above, using two subjects.
Each subject was asked to perform selections of windows and doors
in three different buildings. We collected statistics for 12 different
cases per user, for a total of 24 cases. We recorded the accuracy of
the intersection algorithms: a boolean value of ‘1’ if an algorithm
made a correct selection, and a ‘0’ if it made an incorrect
selection. Figure 8 shows the accuracy performance of each of the
three algorithms for Experiment 1.

Figure 7: The dataset used in Experiments 1-3. Progressively smaller subsets of the dataset, with different test cases, were used in Experiments 1
and 2. (Left) Ground-level views; these are the views that subjects saw during the experiments. (Right) Elevated view; provided to give the reader
a feel for the dataset's layout.

Figure 8: The accuracy (mean percentage of correct selections) given
by the three intersection algorithms in Experiments 1-3.
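As a minimal sketch of how the ‘1’/‘0’ correctness scores described for Experiment 1 could be turned into a mean accuracy and a standard error, consider the Python fragment below. The function name and the sample scores are hypothetical; they are not the paper's data or code.

```python
import math


def accuracy_with_standard_error(outcomes):
    """Return (mean, standard error) for a list of 0/1 correctness scores."""
    n = len(outcomes)
    if n < 2:
        raise ValueError("need at least two trials to estimate a standard error")
    mean = sum(outcomes) / n
    # Sample variance (n - 1 in the denominator), then standard error of the mean.
    variance = sum((x - mean) ** 2 for x in outcomes) / (n - 1)
    return mean, math.sqrt(variance / n)


# Hypothetical scores for one algorithm over 24 selection cases (not the paper's data).
scores = [1] * 18 + [0] * 6
mean, sem = accuracy_with_standard_error(scores)
print(f"accuracy = {mean:.1%}, standard error = {sem:.3f}")
```

The mean of the 0/1 scores is the percentage of correct selections, which is the quantity plotted per algorithm in Figure 8.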

Experiment 2 The second experiment expanded on the first
experiment, this time using a larger dataset and a broader set of test cases.
As before, the dataset chosen is a subset of Figure 7’s dataset. We
only presented the ground-level view of the dataset to the subjects.
The test cases were determined based on observed strengths and
weaknesses of the three intersection algorithms. We designed the
test protocol such that the subjects would make selections for
situations where each of the algorithms is weak, and cases where each
is strong; we tested an equal number of “good” and “bad” test cases
for each algorithm. We used four subjects and collected the
performance statistics for 192 selection cases (48 per subject). The
users were asked to make selections of windows and buildings. We
recorded the accuracy performance for each of the three algorithms;
the results for Experiment 2 are shown in Figure 8.

Experiment 3 The third experiment used the largest portion of our
test dataset (Figure 7), as well as the broadest set of test cases,
which we believe better explored the strengths and weaknesses of
the intersection algorithms. This experiment followed the same
protocol as Experiment 2, with three subjects and 144 selection cases
(48 per subject), and an equal number of good and bad test cases
for each algorithm. We again recorded the accuracy performance
for each of the three algorithms; the results are shown in Figure 8.

Results and Discussion We analyzed the accuracy performance
results with a one-way analysis of variance (ANOVA). In addition to
the standard p-values, the standard measure of effect significance,
we calculated and report $\omega^2$, a standard measure of effect size.
$\omega^2$ is an approximate measure of the percentage of the observed
variance that can be explained by the effect.
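The text does not say how the effect size was computed. As a hedged illustration, the sketch below uses the standard textbook identity that recovers omega-squared for a one-way between-groups ANOVA from the F statistic and its degrees of freedom. The function name is an assumption made for this example.

```python
def omega_squared_from_f(f_value, df_between, df_within):
    """Estimate omega-squared for a one-way ANOVA from F and its degrees of freedom.

    Uses omega^2 = df_b (F - 1) / (df_b (F - 1) + N), with N = df_b + df_w + 1.
    """
    n_total = df_between + df_within + 1
    effect = df_between * (f_value - 1.0)
    return effect / (effect + n_total)


# F(2, 573) = 20.7 is the Experiment 2 statistic reported in the text.
print(f"{omega_squared_from_f(20.7, 2, 573):.1%}")  # prints 6.4%
```

For the Experiment 2 statistic this identity yields 6.4%, which agrees with the effect size reported for that experiment.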

As suggested by Figure 8, we found a strong effect of
algorithm for each of the experiments (Experiment 1: F(2,69) = 5.07,
p = .009, $\omega^2$ = 10.3%; Experiment 2: F(2,573) = 20.7, p < .000,
$\omega^2$ = 6.4%; Experiment 3: F(2,429) = 12.7, p < .000, $\omega^2$ =
5.2%). The error bars in Figure 8, which show ±1 standard error,
indicate that BARYCENTRIC-PIXEL-COUNT had greater accuracy
than both GAUSSIAN-PIXEL-COUNT and PIXEL-COUNT for all
three experiments. For Experiments 2 and 3, GAUSSIAN-PIXEL-COUNT
had greater accuracy than PIXEL-COUNT.

Figure 9: The effect of BARYCENTRIC-PIXEL-COUNT on selection
involving a concave shape. The blue regions work properly, while the
red fail. Empirical results show regions B and C fail 85% of the time.

This  analysis  empirically  validates  that  the  choice  of  intersection 
algorithm  makes  a  difference  in  selection  accuracy.  Approaches 
similar  to  what  we  are  calling  Barycentric  have  been  presented  as 
Liang and Green’s spotlight method [8] and the aperture method of
Forsberg  et  al.  [3],  but  here  we  present  the  first  empirical  evidence 
for  the  general  effectiveness  of  this  technique. 

However, although the data show that the BARYCENTRIC-PIXEL-COUNT
algorithm clearly outperforms the other algorithms
(for this choice of dataset and degree of pointing precision), we can
observe some interesting test cases if we allow the user to view the
same dataset from above the buildings. For example, we observed
that concave shape patterns show up more often when looking from
above than from the ground-level views (see Figure 9). One weakness of
the BARYCENTRIC-PIXEL-COUNT algorithm is how it operates on
concave objects. The algorithm relies on an estimate of the center
of the objects the user is trying to select. As seen in Figure 9, one
estimate of the center of Building 1 is located at the position where
the ‘o’ is shown. Because the behavior of the algorithm’s weighting
function is complex, we decided to evaluate the test case
empirically. We ran a quick study collecting statistics on how well
different regions of the buildings could be selected properly using
the BARYCENTRIC-PIXEL-COUNT algorithm. Pointing in the
regions shown in blue selects the correct building, while pointing in
the red regions fails. Empirical results show that regions ‘B’ and ‘C’
fail 85% of the time.
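The following toy sketch shows one way a concave footprint can mislead a center estimate: the centroid of an L-shaped set of grid cells falls in a cell that is not part of the shape. The footprint, grid, and function name are hypothetical and are not the geometry of Figure 9 or the paper's center estimator.

```python
def centroid(cells):
    """Mean position of a set of (x, y) grid cells."""
    n = len(cells)
    return (sum(x for x, _ in cells) / n, sum(y for _, y in cells) / n)


# Hypothetical L-shaped footprint on a unit grid (not the geometry of Figure 9).
l_shape = {(0, y) for y in range(4)} | {(x, 0) for x in range(4)}

cx, cy = centroid(l_shape)
nearest_cell = (round(cx), round(cy))

# The centroid lands in cell (1, 1), which is outside the L-shaped footprint.
print((cx, cy), nearest_cell in l_shape)
```

A weighting that rewards pointing near such a center can therefore favor directions that miss the object itself, which is consistent with the kind of failure the regions in Figure 9 illustrate.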

4.2  Evaluation  of  Integration 

We conducted three experiments to evaluate different weighting
schemes for the integration algorithm. Our objective was to
determine how to best utilize different combinations of the intersection
algorithms, including determining when to use an intersection
algorithm by itself or when to combine the algorithms in parallel. We
focused mainly on finding an optimal weight assignment for the
typical use case and compared it against the best intersection
algorithm (BARYCENTRIC-PIXEL-COUNT). In each experiment, we
used the datasets and experimental protocol from the experiments in
Section 4.1. We applied several different weighting schemes
(described below) to the integration algorithm, and for each scheme,
ran an experiment and assessed the performance of the integration.
We show the weight assignments for each experiment in Table 1
and the performance results of the integrations in Figure 10.

Figure 10: The accuracy of the integration schemes versus
the BARYCENTRIC-PIXEL-COUNT intersection algorithm in
Experiments 1-3.

Table  1:  Weight  assignments  for  experiments. 


Exp  Scheme                        W_P     W_B     W_G
1    Equal Weighting               .3333   .3333   .3333
1    Performance-Proportional      .3103   .4138   .2759
1    Performance-Differential      .3031   .4238   .2731
2    Equal Weighting               .3333   .3333   .3333
3    Equal Weighting               .3333   .3333   .3333
3    Performance-Proportional      .28     .38     .34
3    2-Parm-Search (Scheme A)      .1894   .4688   .3418
3    2-Parm-Search (Scheme B)      .1138   .5400   .3462
3    Adhoc (Scheme C)              .1000   .5000   .4000
3    Majority Voting (Scheme D)    n/a     n/a     n/a

Weighting  Schemes  The  weighting  schemes  listed  in  Table  1  and 
Figure  10  are: 

• Equal Weighting: Simply assign the weights $W_P = W_B = W_G = 1/3$.

• Performance-Proportional Weighting: Assign the weights
$W_i = \mathrm{norm}(A_i)$, for $i = P, B, G$, where $A_i$ is the accuracy
estimate for the $i$-th intersection algorithm.

• Performance-Differential Weighting: Assign the weights
$W_i = \mathrm{sum}(P_j^{\mathrm{correct}} - P_j^{\mathrm{next\text{-}highest}})/n$,
for $i = P, B, G$, for $j = 1..n$, where $P^{\mathrm{correct}}$ and
$P^{\mathrm{next\text{-}highest}}$ are the probabilities of
the correct and next highest selection, and $n$ is the number of trials
used to estimate the performance of algorithm $i$. Negative
differences can be assigned zero.

• Two-Parameter-Search Weighting: Start with an estimate
for one of the weights (assume weight $W_1$). Next, use
performance estimates of the algorithms to compute the ratio
$r = \frac{A_1 - A_2}{A_1 - A_3}$, where $A_1$, $A_2$, and $A_3$ are the
respective algorithm accuracies. Compute the weights by solving the two
equations: $W_1 + W_2 + W_3 = 1$ and $W_1 - W_2 = r(W_1 - W_3)$.

•  Adhoc  Weighting:  We  assigned  the  weights  by  hand,  based 
on  observed  trends  in  the  performance  of  the  other  weighting 
schemes. 


• Majority Voting: Select the candidate object that has the
majority of the votes across the intersection algorithms.
Empirical data, when using the equal weighting scheme, show:
(a) if all three vote for one object, the integration always votes
for that object; and (b) if two of the three vote for the same
object, the integration algorithm selects the same object a
majority of the time.
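The schemes listed above can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' implementation: all function names are hypothetical, "norm" is taken to mean rescaling so the weights sum to one (every numeric row of Table 1 sums to one), and the same rescaling is assumed for the differential scheme. In the usage lines, the starting weight .4688 is the largest weight of Scheme A in Table 1, and r = 2.2 is a value back-computed from that row rather than one stated in the text.

```python
from collections import Counter

ALGORITHMS = ("P", "B", "G")  # assumed keys: Pixel-Count, Barycentric, Gaussian


def normalize(values):
    """Rescale a dict of non-negative scores so they sum to one."""
    total = sum(values.values())
    return {k: v / total for k, v in values.items()}


def equal_weights():
    return {a: 1.0 / len(ALGORITHMS) for a in ALGORITHMS}


def performance_proportional(accuracies):
    """accuracies maps algorithm -> accuracy estimate."""
    return normalize(accuracies)


def performance_differential(trials):
    """trials maps algorithm -> list of (p_correct, p_next_highest) pairs."""
    margins = {
        a: sum(max(pc - pn, 0.0) for pc, pn in pairs) / len(pairs)
        for a, pairs in trials.items()
    }
    return normalize(margins)  # assumption: rescale so the weights sum to one


def two_parameter_search(w1, r):
    """Solve W1 + W2 + W3 = 1 and W1 - W2 = r (W1 - W3) for (W2, W3)."""
    w3 = (1.0 - 2.0 * w1 + r * w1) / (1.0 + r)  # undefined for r = -1
    w2 = 1.0 - w1 - w3
    return w2, w3


def majority_vote(votes):
    """Return the object chosen by more than half of the algorithms, else None."""
    winner, count = Counter(votes).most_common(1)[0]
    return winner if count > len(votes) / 2 else None


# Usage with the assumed inputs described above.
w2, w3 = two_parameter_search(0.4688, 2.2)
print(round(w2, 4), round(w3, 4))
print(majority_vote(["window", "window", "wall"]))
```

With these inputs the two-parameter solve returns the remaining two weights of the Scheme A row, which suggests (but does not confirm) that the starting weight in that scheme was the one assigned to the Barycentric algorithm.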

Results and Discussion We again analyzed the accuracy
performance with a one-way ANOVA. We found a strong effect of
algorithm/integration scheme for Experiment 1 (F(3,116) = 4.37,
p = .006, $\omega^2$ = 7.8%) and Experiment 2 (F(1,382) = 10.8, p =
.001, $\omega^2$ = 2.5%). In addition, we found an effect for Experiment 3
(F(6,809) = 2.17, p = .044, $\omega^2$ = .85%), and while this effect is
significant, it is a substantially weaker effect than the others
reported in this paper. The error bars indicate that in Experiment 1,
performance breaks down into two groups: (1) the Barycentric
intersection algorithm and the Performance-Proportional integration
scheme, and (2) the Equal Weighted and Performance-Differential
schemes, with group (1) performing better than group (2). In
Experiment 2, the Barycentric algorithm performed better than the
Equal Weighted scheme. We only tested one integration scheme
in this experiment because we decided it would be more interesting
to look at test cases where the best intersection algorithm
(Barycentric) had a lower global effectiveness, which motivated the next
experiment. In Experiment 3, the Equal Weighted and
Performance-Proportional schemes appear worse than the others, but otherwise
the performance of all the schemes (and the Barycentric algorithm)
is comparable. This relative equality is why the effect significance
and size are substantially smaller for Experiment 3.

4.3  Evaluation  of  Disambiguation 

The overall goal of our research was to develop methods for
disambiguating multiple selections. We made strides towards this goal
by developing several intersection algorithms and an integration
algorithm that can combine the algorithms in many ways. The
combination can be automatic, by using one of the schemes described
previously, or it can be manual, by allowing the user to explicitly
set the weights in the integration algorithm; for example, to use
only the BARYCENTRIC-PIXEL-COUNT algorithm, the weights are
set to $W_B = 1$ and $W_P = W_G = 0$. However, we have not yet shown
that the integration algorithm is effective at disambiguating
multiple selections, and we address that now.
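As a rough sketch of the manual weight setting just described, the fragment below assumes the integration step is a weighted sum of per-algorithm object probabilities, which this section does not spell out. The function name, the algorithm keys, and all probabilities are hypothetical.

```python
def integrate(probabilities, weights):
    """Combine per-algorithm object probabilities with a weighted sum.

    probabilities maps algorithm -> {object: probability};
    weights maps algorithm -> weight.
    Returns (object, score) pairs, most likely object first.
    """
    combined = {}
    for alg, per_object in probabilities.items():
        for obj, p in per_object.items():
            combined[obj] = combined.get(obj, 0.0) + weights.get(alg, 0.0) * p
    return sorted(combined.items(), key=lambda item: item[1], reverse=True)


# Hypothetical probabilities for two candidate objects (not measured data).
probs = {
    "P": {"wall": 0.7, "window": 0.3},
    "B": {"wall": 0.2, "window": 0.8},
    "G": {"wall": 0.4, "window": 0.6},
}
only_barycentric = {"P": 0.0, "B": 1.0, "G": 0.0}
only_pixel_count = {"P": 1.0, "B": 0.0, "G": 0.0}

print(integrate(probs, only_barycentric)[0][0])  # window
print(integrate(probs, only_pixel_count)[0][0])  # wall
```

Setting one weight to 1 and the others to 0 reduces the combination to a single intersection algorithm, so the same code path covers both the individual algorithms and the mixed schemes.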

We conducted Experiment 4 to demonstrate, and evaluate
empirically, the effectiveness of the integration algorithm in a simple
disambiguation scenario. The experiment was performed using two
subjects. We developed a dataset that requires a higher degree of
precision for selecting by pointing than was necessary in the
previous experiments. The dataset consists of a wall with nine windows;
we placed a large window in the center and arranged eight smaller
rim windows surrounding the center window. We asked subjects to
select each of the nine windows separately, and the order of
selection was random.

We collected statistics for 18 selections of the middle window
and 31 selections of the outer windows. We analyzed the data
considering how different selection strategies could be used for the
scenario. Specifically, we considered how the windows could be
selected using four strategies. The first strategy is to simply point
at the window: “select object by pointing at it.” The second
strategy is to select the largest window, no matter where the user is
pointing: “select largest object.” The third and fourth strategies
are combinations of the previous two, called “select largest object
while pointing at it” and “select largest object while not pointing at
it.” The results for these four selection strategies are shown in
Figure 11.


Figure 11: Comparison between the PIXEL-COUNT and
BARYCENTRIC-PIXEL-COUNT algorithms over a range of user
selection strategies. The dataset contains small, distant objects that
are very difficult to select by pointing. The noise in pointing is high
enough that BARYCENTRIC-PIXEL-COUNT fails often.

The PIXEL-COUNT algorithm worked perfectly for selecting the
largest object across all the selection cases. The most interesting
result is the overlapping case, “select largest object while
pointing at it,” where the results show the success of the PIXEL-COUNT
algorithm in comparison to the weaker-performing
BARYCENTRIC-PIXEL-COUNT algorithm. This result seems to indicate that as the
precision required increases, the BARYCENTRIC-PIXEL-COUNT
algorithm will not be as reliable as PIXEL-COUNT for
selecting the largest item.

The difference in performance between the “select largest
object while pointing at it” and the “select object by pointing at
it” strategies indicates that the surrounding objects may be selected
correctly more often than the middle object. This phenomenon may
occur  because  the  rim  objects  have  free  space  on  some  of  their  sides. 
If  this  is  the  case,  the  middle  object  has  a  slight  disadvantage  using 
the  “select  object  by  pointing  at  it”  strategy. 

5  Conclusions  and  Future  Work 

We presented three 3D pointing-based object selection techniques,
PIXEL-COUNT, GAUSSIAN-PIXEL-COUNT, and
BARYCENTRIC-PIXEL-COUNT, and applied them to the case of selection using a
hierarchically organized object database. We empirically
demonstrated the general effectiveness of the
BARYCENTRIC-PIXEL-COUNT technique. However, there are cases where that technique
fails, so we developed an integration algorithm to try to leverage
the strengths of each technique by combining their results using
one of many weighting schemes. We evaluated these schemes and
presented a careful analysis of the results. The bottom line is that
different selection schemes work best in different scenarios, and
the selection integration algorithm can disambiguate multiple
selections.

There are several ways to improve this work. First, we need to
take advantage of the semantic information in our database; for
example, if it seems the user is selecting a window, the user may
actually be trying to select an object behind that window.
Second, we can involve the user in the selection process beyond simple
pointing. Perhaps the user could manipulate a dial or similar
controller to scroll through the multiple selections until the correct
object is selected. Third, we could determine the best sets of weights
for certain common types of databases, or even for different areas
within a single database, and establish “weight profiles” to be
employed for those databases or areas of databases. A modification
of that idea is to classify weight assignments by the situations in
which they work best, and then use that information to develop
better-performing automated selection techniques that adjust to
the situation on the fly by swapping weight assignments. Another
possibility is to have the algorithms learn from user indications as
to whether the correct item was chosen. An implementation
issue to address is how to properly handle ties in probabilities.
For future studies, we would also like to add more subjects to the
experiments.

Figure 2: Operation of PIXEL-COUNT algorithm. The scene is rendered into the small selection frustum shown in the center of the image.

Figure 3: Operation of BARYCENTRIC-PIXEL-COUNT algorithm. Because $d_1 < d_2$, each pixel of object 1 will be weighted more heavily than the pixels of object 2.

Figure 4: Operation of GAUSSIAN-PIXEL-COUNT algorithm. Pixels in objects 1 and 2 are weighted by a circularly symmetric Gaussian function centered at $F_c$.


[Figure labels: same locations for centers of objects; same amount of blank space between objects; same sizes of objects; Wall, Window, Door; regions A, B, C, D; Building 1, Building 2]

Figure  6:  Three  test  cases  used  to  determine  the  degree  of  precision  needed  by  the  experiments.  We  varied  the  size  of  the  windows  and  spacing  between  each,  and 
asked  several  users  to  perform  selections  of  the  windows  for  each  test  case.  Statistics  were  collected  to  determine  the  precision  required  for  our  experiments. 



